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Introduction 

Prostate  cancer  is  the  most  common  non- dermatologic  cancer  in 
the  United  States  [1] .  It  can  potentially  arise  from  altered  genes 
in  multiple  information  pathways  and  followed  by  a  sequential 
process  of  mutation  along  with  the  progression  of  the  cancer. 
Therefore,  prostate  cancer  can  be  taken  as  a  variety  of  distinct 
diseases,  each  of  which  results  from  some  kind  of  perturbation  of  a 
different  information  pathway.  For  example,  LNCaP  cells  are 
androgen  responsive  that  harbor  a  functional  androgen  receptor, 
while  CL- 1  cells  are  androgen- independent  that  survive  an  extended 
period  of  androgen  deprivation  [2]  . 

Androgens  are  steroid  hormones,  synthesized  in  the  Leydig  cells 
in  testis.  By  binding  and  activating  the  androgen  receptor  (AR) 
protein,  androgens  exert  their  effects  in  vivo  in  the  development, 
differentiation,  and  function  of  male  reproductive  and  accessory 
sex  tissues  such  as  prostate.  Once  androgens  enter  the  cell  and 
bind  to  AR,  it  undergoes  a  conformation  change  in  its  ligand¬ 
binding  domain,  and  causes  the  dissociation  of  several  accessory 
proteins.  Then  the  DNA  binding  domain  of  the  AR  can  bind  to  a 
specific  sequence  called  the  androgen  responsive  element  (ARE) 
which  appears  in  promoter  area  of  many  genes  with  important 
function  in  maintaining  the  male  phenotype.  This  binding  of  AR 
dimmer  to  ARE  induces  transcriptional  activities  of  those  androgen 
responsive  genes.  Some  of  these  ARE-containing  genes  are  also 
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transcriptional  factors,  like  NKX3 . 1  [3]  . 

The  goal  of  this  project  is  to  understand  the  transcriptional 
program  behind  the  prostate  cancer  and  thus  provide  a  logical 
framework  of  finding  novel  target  genes  for  developing  therapies 
for  prostate  cancer.  To  date,  about  20  genes  have  been  confirmed  to 
be  transcriptionally  regulated  by  AR  through  AREs .  Although 
presence  of  ARE- like  sequence  near  or  in  a  gene  does  net 
necessarily  imply  that  it  is  regulated  by  AR  in  vivo,  such 
sequences,  particularly  when  found  in  regulatory  region  of  a  gene, 
can  guide  an  experimental  test  of  its  functional  relationship  to  AR. 
The  consensus  ARE  is  a  palindrome,  two  6bp  (5 ' -TGTTCT-3 ’ )  inverted 
sites  separated  by  a  3bp  spacer.  Recently,  some  AR  specific- 
regulated  genes  are  found  to  have  a  head- to- tail  combination  of  the 
hexamer  sites,  and  the  right  half  of  the  direct  repeat  is  more 
conserved  than  the  left  half.  On  the  other  hands,  the  molecular 
pathways  accounting  for  androgen  regulation  remain  incompletely 
characterized.  It  seems  rational  to  set  up  a  model  of  the  androgen 
response  program  involving  combinations  and  interactions  between 
multiple  known  and  unknown  pathways.  We  hypothesize  that  genes  and 
proteins  associated  with  key  nodes  within  and  between  special 
pathways  in  androgen  sensitive  and  independent  cells  have 
substantial  potential  as  therapeutic  targets  for  prostate  cancer. 


Key  Research  Accomplishments 

1)  Comprehensive  datasets  from  our  experiments  are  integrated  into 
robust  relational  databases,  as  hosted  and  shown  in  SBEAMS 
(http://db.systemsbiology.net/sbeams/)  and  LYNX  Signorae  Browser 
( ht  tps : / / sgb . lynxgen . com/ sgb/ sgb/ ) . 

2)  Position  weighted  matrices  (PWM)  are  generated  for  the  androgen 
responsive  element  (ARE)  based  on  known  AREs  in  human  genome, 
which  is  important  for  identifying  genes  with  AREs  and/or  other 
known  TFBSs  involved  in  androgen  response  program 

3)  A  set  of  putative  AREs  are  screened  by  using  Mogul  on  5k 
upstream  region  of  each  gene  which  is  significantly  up-regulated 
according  to  our  MPSS  (massively  parallel  signature  sequencing) 
and  ICAT  (isotope-coded  affinity  tags)  experimental  data; 

4)  Androgen  receptor  (AR)  pathways  are  reconstructed  based  on  the 
data  from  our  experiments  and  other  public  data  resources,  such 
as  the  TRANS FAC  database  and  KEGG  (Kyoto  Encyclopedia  of  Genes 
and  Genomes)  pathway  database.  These  pathways  are  visualized  by 
using  both  Cytoscape  and  BioTapestry  tools,  providing  an 
approach  to  identify  possible  genes  and  proteins  which  are 
associated  with  key  nodes  within  and  between  special  pathways  in 
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androgen  sensitive  and  independent  cells  and  thus  are 
substantial  potential  as  therapeutic  targets  for  prostate  cancer; 

5)  Comprehensive  datasets  from  our  experiments  are  integrated  into 
a  robust  relational  database,  which  is  also  hosted  in  SBEAMS 
(http://db.systemsbiology.net/  sbeatns/)  and  LYNX  Signome  Browser 
(https://sgb.lynxgen.com/sgb/sgb/)  for  shared  access; 

6)  49  tissue-specific  MPSS  datasets  are  integrated  to  analyze 
tissue-specificity,  and  various  tissue-specific  genes  are 
identified  in  terms  of  Unigene  Cluster  ID,  including  those  for 
prostate,  ovary,  liver  and  mammary  gland.  Tissue-specificity  is 
also  analyzed  by  homolog  comparison  of  human  with  mouse,  based 
on  public  mouse  MPSS  data  sources; 

7)  Secretory  prediction,  digested  peptides,  and  N-glycosylation 
sites  are  calculated  for  all  human  Refseq  proteins,  which  is 
useful  for  further  data  analysis  and  experiment; 

8)  Some  potential  biomarkers  are  identified  for  further  evaluation 
and  validation.  This  might  potentially  improve  the  prediction 
and  diagnoses  of  prostate  cancer; 

9)  A  windows  version  of  stand-alone  siRNA  designer  is  programmed 
and  tested.  This  program  can  be  used  to  scan  the  length  of  the 
target  gene  for  candidate  siRNAs  that  satisfy  the  user-specified 
rules.  For  each  siRNA  candidate,  a  set  of  deltaG  values  and 
score  is  calculated.  Top  siRNA  candidates  are  then  selected  for 
Blast.  Blast  files  are  parsed  and  blast  summary  is  reported  in  a 
summary  file  along  with  siRNA  candidate  hybrid  Melting 
Temperature.  NCBI -Blast,  WU-Blast  and/or  Fasta  searches  can  be 
chosen  as  blast  methods  for  siRNA  candidates. 


Reportable  Outcomes 

Model  for  disease-perturbed  network  is  set  up.  Algorithms  are 
designed  and  implemented  to  calculate  perturbation  networks  between 
two  stages  of  prostate  cancer  progression.  Gene  information  from 
314  BioCarta  pathways  and  155  KEGG  pathways  is  extracted.  37 
BioCarta  and  14  KEGG  pathways  are  identified  up-regulated,  and  23 
BioCarta  and  22  KEGG  pathways  are  found  down- regulated,  in  LNCaP 
cells  versus  CL1  cells,  based  on  MPSS  data.  This  is  a  significant 
step  to  understanding  prostate  cancer  progression  by  means  of 
systems  approach. 


Conclusions 


5 


The  systems  biology  provides  a  new  powerful  approach  to 
identifying  AR  pathways.  These  pathways  are  interesting  for  studies 
on  prostate  cancer  because  the  disease  process  should  be  reflected 
in  disease -perturbed  protein  and  gene  regulatory  networks  [4] . 


Abbreviations 


AR  Androgen  Receptor 

ARE  Androgen  Receptor  Element 

ICAT  isotope-Coded  Affinity  Tags 

MPSS  Massively  Parallel  Signature  Sequencing 

PWM  Position  Weighted  Matrix 

SBEAMS  Systems  Biology  Experiment  Analysis  System 
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Evidence  for  the  Presence  of  Disease-Perturbed  Networks  in 
Prostate  Cancer  Cells  by  Genomic  and  Proteomic  Analyses: 

A  Systems  Approach  to  Disease 

Biaoyang  Lin,'  James  T.  White,'  Wei  Lu,'  Tao  Xie,'  Angelita  G.  Utleg,'  Xiaowei  Yan,'  Eugene  C.  Yi,' 
Paul  Shannon,'  Irina  Khrebtukova,3  Paul  H,  Lange,"  David  R.  Goodlett,'  Daixing  Zhou,' 

Thomas  J.  Vasicek,  and  Leroy  Hood' 

Institute  for  Systems  Biology:  ’Department  of  Urology.  University  of  Washington.  Seattle.  Washington:  and 
'Lynx  Therapeutics.  Inc-  Hayward.  California 


Abstract 

Prostate  cancer  is  initially  responsive  to  androgen  ablation 
therapy  and  progresses  to  androgen-unresponsive  states  that 
are  refractory  to  treatment.  The  mechanism  of  this  transition  is 
unknown.  A  systems  approach  to  disease  begins  with  the 
quantitative  delineation  of  the  informational  elements  (mRNAs 
and  proteins)  in  various  disease  states.  We  employed  two 
recently  developed  high-throughput  technologies,  massively 
parallel  signature  sequencing  (MPSS)  and  isotope-coded 
affinity  tag,  to  gain  a  comprehensive  picture  of  the  changes  in 
mRNA  levels  and  more  restricted  analysis  of  protein  levels, 
respectively,  during  the  transition  from  androgen-dependent 
LNCaP  (model  for  early-stage  prostate  cancer)  to  androgen- 
independent  CL1  cells  (model  for  late-stage  prostate  cancer). 
We  sequenced  >5  million  MPSS  signatures,  obtained  >142,000 
tandem  mass  spectra,  and  built  comprehensive  MPSS  and 
proteomic  databases.  The  integrated  mRNA  and  protein 
expression  data  revealed  underlying  functional  differences 
between  androgen-dependent  and  androgen-independent 
prostate  cancer  cells.  The  high  sensitivity  of  MPSS  enabled  us 
to  identify  virtually  all  of  the  expressed  transcripts  and  to 
quantify  the  changes  in  gene  expression  between  these  two  cell 
states,  including  functionally  important  low-abundance 
mRNAs,  such  as  those  encoding  transcription  factors  and 
signal  transduction  molecules.  These  data  enable  us  to  map  the 
differences  onto  extant  physiologic  networks,  creating  pertur¬ 
bation  networks  that  reflect  prostate  cancer  progression.  We 
found  37  BioCarta  and  14  Kyoto  Encyclopedia  of  Genes  and 
Genomes  pathways  that  are  up-regulated  and  23  BioCarta  and 
22  Kyoto  Encyclopedia  of  Genes  and  Genomes  pathways  that 
are  down-regulated  in  J-NCaP  cells  versus  CL1  cells.  Our  efforts 
represent  a  significant  step  toward  a  systems  approach  to 
understanding  prostate  cancer  progression.  (Cancer  Res  2005; 
65(8):  3081-91) 

Introduction 

Prostate  cancer  is  the  most  common  nonderniatologic  cancer  in 
the  United  States  (1).  Initially,  its  growth  is  androgen  dependent; 
early-stage  therapies,  including  chemical  and  surgical  castration, 
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kill  cancerous  cells  by  androgen  deprivation.  Although  such 
therapies  produce  tumor  regression,  they  eventually  fail  because 
most  prostate  carcinomas  become  androgen  independent  (2).  To 
improve  the  efficacy  of  prostate  cancer  therapy,  it  is  necessary  to 
understand  the  molecular  mechanisms  underlying  the  transition 
from  androgen  dependence  to  androgen  independence. 

The  transition  from  androgen-dependent  to  androgen-inde¬ 
pendent  status  likely  results  from  multiple  processes,  including 
activation  of  oncogenes,  inactivation  of  tumor  suppressor  genes, 
and  changes  in  key  components  of  signal  transduction  pathways 
and  gene  regulatory  networks.  Systems  approaches  to  biology  and 
disease  are  predicated  on  the  identification  of  the  elements  of  the 
systems,  the  delineation  of  their  interactions,  and  their  changes 
in  distinct  disease  states.  Biological  information  is  of  two  types: 
the  digital  information  of  the  genome  (e.g„  genes  and  cfs-control 
elements)  and  environmental  cues.  Normal  protein  and  gene 
regulatory  networks  may  be  perturbed  by  disease,  through 
genetic  and/or  environmental  perturbations,  and  understanding 
these  differences  lies  at  the  heart  of  systems  approaches  to 
disease.  Disease-perturbed  networks  initiate  altered  responses 
that  bring  about  pathologic  phenotypes,  suclt  as  the  invasiveness 
of  cancer  ceils. 

To  map  network  perturbations  in  cancer  initiation  anti 
progression,  one  must  measure  changes  in  expression  levels  of 
virtually  all  transcripts.  Certain  low-abundance  transcripts,  such  as 
those  encoding  transcription  factors  and  signal  transducers,  wield 
significant  regulatory  influences  in  spite  of  the  fact  they  may  be 
present  in  the  cell  at  very  low  copy  numbers.  Differential  display  (3) 
or  cDNA  microarrays  (4,  5)  have  been  used  to  profile  changes  in 
gene  expression  during  the  androgen-dependent  to  androgen- 
independent  transition;  however,  those  technologies  can  identify 
only  a  limited  number  of  more  abundant  mRNAs,  and  they  miss 
many  low-abundance  mRNAs  due  to  their  low  detection  sensitiv¬ 
ities.  Massively  parallel  signature  sequencing  (MPSS),  a  recently- 
introduced  method,  allows  20-nucleotide  signature  sequences  to  be 
determined  in  parallel  for  >1,000,000  DMA  sequences  from  an 
individual  cDNA  library’  or  ceil  state  (6).  The  frequency  of  each 
MPSS  signature  was  calculated  for  each  sample  and  represented  in 
transcripts  per  million  (tpm).  MPSS  technology  allows  identifica¬ 
tion  and  cataloging  of  almost  ail  mRNAs,  even  those  with  one  or  a 
few  transcripts  per  ceil.  Differentially  expressed  genes  thus 
identified  can  be  mapped  onto  cellular  networks  to  provide  a 
systemic  understanding  of  changes  in  cellular  state. 

/Although  transcriptomc  (mRNA  levels)  differences  are  easier  to 
study  than  proteome  (protein  levels)  differences,  cellular  functions 
are  usually  performed  by  proteins.  RNA  expression  profiling  studies 
do  not  address  how  the  encoded  proteins  function  biologically,  and 
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transcript  abundance  levels  do  not  always  correlate  with  protein 
abundance  levels  (7).  We  therefore  complemented  our  mRNA 
expression  profiling  with  a  more  limited  protein  profiling  by  using 
isotope-coded  affinity  tags  (1CAT)  coupled  with  tandem  mass 
spectrometry  (MS/MS;  ref.  8). 

The  LNCaP  cell  line  is  a  widely  used  androgen-sensitive  model 
for  early-stage  prostate  cancer  from  which  androgen-independent 
sublines  have  been  generated  {4,  5,  9).  The  cells  of  one  such 
variant,  CL1,  in  contrast  to  their  LNCaP  progenitors,  are  highly 
tumorigenic  and  exhibit  invasive  and  metastatic  characteristics  in 
intact  and  castrated  mice  {9,  10).  Thus,  CL1  cells  model  iate-stage 
prostate  cancer.  MPSS  and  ICAT  data  extracted  from  these  model 
cell  lines  can  be  validated  by  real-time  reverse  transcription-PCR 
(RT-PCR)  or  Western  blot  analysis  in  more  relevant  biological 
models  (tumor  xenografts)  and  in  tumor  biopsies. 

We  conducted  a  MPSS  analysis  of  ~5  million  signatures  for 
the  androgen-dependent  LNCaP  cell  line  and  its  androgen- 
independent  derivative  CL1.  Our  database  offers  the  first 
comprehensive  view  of  the  digital  transcriptomes  of  two  states 
of  prostate  cancer  cells  and  allows  us  to  explore  the  cellular 
pathways  perturbed  during  the  transition  from  androgen- 
dependent  to  androgen-independent  growth.  We  additionally 
compared  protein  expression  profiles  between  LNCaP  and  CLI 
cells  using  ICAT-MS/MS  technology.  These  are  the  first  steps 
toward  a  systems  approach  to  disease  through  an  integrative, 
systemic  understanding  of  prostate  cancer  progression  at  the 
mRNA,  protein,  and  network  levels. 

Materials  and  Methods 

Massively  parallel  signature  sequencing  analysis.  LNCaP  and  CLI 
cells  were  grown  as  described  by  Tso  et  al.  (10).  MPSS  cDNA  libraries  were 
constmcted,  and  individual  cDNA  sequences  were  amplified,  attached  to 
individual  beads,  and  sequenced  as  described  elsewhere  (6).  The  resulting 
signatures,  generally  20  bases  long,  were  annotated  using  the  then  most 
recently  annotated  human  genome  sequence  {Human  Genome  Release 
hgl6,  released  in  November  2003)  and  the  human  Unigene  {Unigene  Build 
171.  released  in  July  2004)  according  to  a  previously  published  method  (11). 
We  considered  only  100%  matches  between  a  MPSS  signature  and  a  genome 
signature.  We  also  excluded  those  signatures  that  expressed  at  <3  tpm  in 
both  LNCaP  and  CLI  libraries,  as  they  might  not  be  reliably  detected  (12). 
Additionally,  we  classified  cDNA  signatures  by  their  positions  relative  to 
polyadenylation  signals  and  polyadenylic  acid  [po!y(A)]  tails  and  by  their 
orientation  relative  to  the  S'-3'  orientation  of  source  mRNA  The  Z-test 
(13,  14)  was  used  to  calculate  Ps  for  comparison  of  gene  expression  leveis 
between  the  ceil  lines. 

Isotope-coded  affinity  tag  analysis.  ICAT  reagents  were  purchased 
from  Applied  Biosystems,  Inc  (Foster  City,  CA)  Fractionation  of  cells  into 
cytosolic,  microsomal,  and  nuclear  fractions  (15),  as  well  as  ICAT  labeling 
MS/MS,  and  data  analyses,  were  done  as  described  by  Han  et  ai.  (15).  In 
addition,  probability  score  analysis  (16)  and  Automated  Statistical  Analysis 
on  Protein  Ratio  (17)  were  used  to  assess  the  quality  of  MS  spectra  and  to 
calculate  protein  ratios  from  multiple  peptide  ratios.  Descriptions  of  these 
software  tools  are  available  at  http://regis.systemsbiology.net/software.  To 
compare  protein  and  mRNA  expression  levels,  the  Unigene  numbers  of  the 
differentially  expressed  proteins  were  used  to  find  MPSS  signatures  and 
their  expression  levels  in  tpm.  If  one  Unigene  had  more  titan  one  MPSS 
signature  likely  due  to  alternative  terminations,  the  average  tpm  of  all 
signatures  was  taken. 

Real-time  reverse  transcription-PCR.  All  primers  were  designed  with 
the  PR1MER3  program  (http:/ Avmv-genome.wi.mit.edu/cgi-bin/primer/ 
primer3_www.cgi)  and  BLAST  searched  against  the  human  cDNA  and 
expressed  sequence  tag  (EST)  database  for  uniqueness.  Primer  sequences 
and  PCR  conditions  are  available  on  request.  Real-time  PCR  was  done  on  an 


ABI  7700  machine  (Applied  Biosystems),  and  SYBR  Green  dye  (Molecular 
Probes.  Inc.,  Eugene,  OR)  was  used  as  a  reporter.  PCR  conditions  were 
designed  to  give  bands  of  the  expected  size  with  minimal  primer  dimer 
bands. 

Identification  of  perturbed  networks.  Genes  in  the  314  BioCarta  and 
IS5  Kyoto  Encyclopedia  of  Genes  and  Genomes  (KEGG)  pathways  or 
networks  (http://cgap.nci.nih.gov/Pathways/)  were  downloaded  and  com¬ 
pared  with  the  MPSS  data  using  Unigene  IDs  as  identifiers.  If  a  Unigene  ID 
or  an  Enzyme  Classification  number  corresponded  to  multiple  signatures 
potentially  due  to  multiple  alternatively  terminated  isoforms,  tile  tpm 
counts  of  the  isoforms  were  combined  and  then  subjected  to  the  Z-test 
(13,  14).  Genes  with  Ps  of  <0.001  were  considered  to  be  significantly 
differentially  expressed.  The  following  criteria  were  used  to  identify 
perturbed  networks:  a  perturbed  network  must  have  more  than  three 
genes  represented  on  our  differentially  expressed  gene  list  (P  <  0.001)  and  at 
least  50%  of  those  genes  must  be  up-regulated  (an  up-regulated  pathway)  or 
down-regulated  (a  down-regulated  pathway). 

Prediction  of  secreted  proteins.  Proteins  with  signal  peptides  (classic 
secretory  proteins)  were  predicted  using  the  same  criteria  described  by 
Chen  et  al.  (18)  with  the  SignalP  3.0  server  (http://mvw.cbs.dtu.dk/service5/ 
SignalP-3.0/)  and  the  TMHMM2.0  server.  Putatively  nonclassic  secretory 
secreted  proteins  (without  signal  peptides)  were  predicted  based  on  the 
SecretomeP  1.0  server  (http://www.cbs.dtu.dk/services/SecretomeP-I.O/) 
and  required  an  odds  ratio  score  of  >3.0. 

Results 

Massively  parallel  signature  sequencing  analyses  of  the 
androgen-dependent  LNCaP  cell  line  and  its  androgen- 
independent  variant  CLI.  Using  MPSS  technology,  we  sequenced 
2.22  million  signature  sequences  for  LNCaP  ceils  and  2.96  million 
for  CLI  cells.  We  identified  a  total  of  19,595  unique  transcript 
signatures  expressed  at  levels  >3  tpm  in  at  least  one  of  the  samples. 
The  signatures  were  classified  into  three  major  categories:  1,093 
signatures  matched  repeat  sequences,  15,541  signatures  matched 
unique  cDNAs  or  ESTs,  and  2,961  signatures  had  no  matches  to  any 
cDNA  or  EST  sequences  (but  did  match  genomic  sequences).  The 
last  category  included  sequences  failing  into  one  of  three  different 
categories:  signatures  representing  new  transcripts  yet  to  be 
defined,  signatures  representing  polymorphisms  in  cDNA  sequen¬ 
ces  (a  match  of  a  MPSS  sequence  to  cDNA  or  EST  sequences 
requires  100%  sequence  identity),  or  errors  in  the  MPSS  reads. 
Transcript  tags  with  matches  to  a  cDNA  or  EST  sequence  were 
further  classified  based  on  the  signatures'  relative  orientation  to 
transcription  direction  and  their  position  relative  to  a  polyadeny¬ 
lation  site  and/or  poly(A)  tail.  We  also  built  a  searchable  MySQL 
database  (http://www.mysql.com)  containing  the  expression  levels 
(tpm),  the  genomic  locations  of  the  MPSS  sequences,  the  cDNAs  or 
EST  matches,  and  the  classification  of  each  signature.  A  detailed 
description  of  the  schema  for  classification  is  available  in 
Supplementary  Table  SI.  A  snapshot  of  a  representative  data 
query  is  shown  in  Supplementary  Fig.  SI. 

We  first  restricted  our  analysis  to  those  MPSS  signatures 
corresponding  to  cDNAs  with  poly(A)  tails  and/or  polyadenylation 
sites,  so  that  corresponding  genes  could  be  conclusively  identified. 
We  used  the  2-test  (13,  14)  to  compare  differential  gene  expression 
between  LNCaP  and  CLI  cells.  Using  very  stringent  Ps  (<0.001),  we 
identified  2,088  MPSS  signatures  (corresponding  to  1,987  unique 
genes,  as  some  genes  have  two  or  more  MPSS  signatures  due  to 
alternative  uses  of  polyadenylation  sites)  with  significant  differen¬ 
tia]  expression.  Of  these,  1,011  signatures  (965  genes)  were 
overexpressed  in  CLI  cells  and  1,077  signatures  (1,022  genes)  were 
overexpressed  in  LNCaP  cells  (Supplementary  Table  S2).  The  2- 
score  is  related  to  mRNA  abundance  in  the  library.  For  example. 
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using  a  cutoff  P  of  <0.001  in  our  data  set,  the  expression  level  in 
tpm  changed  from  0  to  26  tpm  for  the  most  lowly  expressed 
transcript  (>26-fold)  but  changed  from  7,591  and  11,206  tpm  for  the 
most  highly  expressed  transcript  (1.48-fold). 

We  randomly  selected  nine  genes  from  the  1,987  differentially 
expressed  genes  identified  by  our  MPSS  analysis  and  compared 
their  changes  in  expression  levels  with  those  obtained  by 
quantitative  real-time  RT-PCR  techniques.  We  showed  that  the 
expression  levels  of  these  nine  genes  changed  in  the  same  direction 
(Table  1).  The  MPSS  expression  profiling  data  were  also  consistent 
with  the  available  published  data.  For  example,  using  RT-PCR,  Patel 
et  ai.  (9)  showed  that  CL1  tumors  express  barely  detectable 
prostate-specific  antigen  (PSA)  and  androgen  receptor  mRNAs 
compared  with  LNCaP  cells.  Our  MPSS  results  indicated  that 
LNCaP  cells  expressed  584  tpm  of  androgen  receptor  and  841  tpm 
of  PSA:  CL1  cells  did  not  express  either  androgen  receptor  or  PSA 
(0  tpm  in  both  cases).  Freedland  et  al.  found  that  CD  10  expression 
was  lost  in  CL1  cells  compared  with  LNCaP  cells  (19):  likewise,  we 
found  that  CD10  was  expressed  at  0  tpm  in  CL1  cells  but  at  56  tpm 
in  LNCaP  cells.  Using  cDNA  microarrays,  Vaarala  et  al.  (4) 
compared  LNCaP  cells  and  another  androgen-independent  variant, 
non-PSA-producing  LNCaP  line,  which  is  similar  to  CL1,  and 
identified  a  total  of  56  differentially  expressed  genes.  We  found  that 
the  expression  levels  of  these  56  genes  changed  in  the  same 
direction  (concordant)  between  LNCaP  and  CL1  cells  and  between 
LNCaP  and  non-PSA-producing  LNCaP  cells  (data  not  shown).  This 
identification  of  1,987  versus  56  differentially  expressed  genes, 
respectively,  underscores  the  striking  differences  in  sensitivity 
between  MPSS  and  cDNA  microarray  techniques. 

To  compare  the  sensitivity  of  the  MPSS  and  cDNA  microarray 
procedures,  we  hybridized  cDNA  microarrays  containing  40,000 
human  cDNAs  to  the  same  LNCaP  and  CL1  RNAs  that  we  used  for 
MPSS.  Three  replicate  array  hybridizations  were  done.  MPSS 
signatures  and  array  clone  IDs  were  mapped  to  Unigene  IDs  for 
data  extraction  and  comparisons.  We  found  that  only  those  genes 
expressed  at  >40  tpm  by  MPSS  could  be  reliably  detected  as 
changing  levels  by  cDNA  microarray  hybridizations  [judged  by  an 
expression  level  twice  the  SD  of  the  background,  a  standard  cutoff 
value  for  microarray  data  analysis  (data  not  shown)].  This 
observation  is  consistent  with  the  33  to  60  tpm  sensitivity  of 
microarrays  estimated  from  the  experiment  of  Hill  et  al.  (20),  in 


which  known  concentrations  of  synthetic  transcripts  were  added. 
In  LNCaP  and  CL1  cells,  -68.75%  (13,471  of  19,595)  of  MPSS 
signatures  (>3  tpm)  were  expressed  at  a  level  below  40  tpm; 
changes  in  the  levels  of  these  genes  will  be  missed  by  microarray 
methods.  Many  attempts  have  been  made  to  increase  the 
sensitivity  of  DNA  array  technology  (21, 22).  We  have  not  compared 
these  new  improvements  against  MPSS,  but  it  is  clear  that  there 
will  still  be  significant  differences  in  the  levels  of  change  that  can 
be  detected. 

Serial  analysis  of  gene  expression  (SAGE;  ref.  23)  is  another 
technology  for  gene  expression  profiling;  like  MPSS,  it  is  digital  and 
can  generate  a  large  number  of  signature  sequences.  However, 
MPSS  (~l  million  signatures  per  sample,)  can  achieve  a  much 
deeper  coverage  than  SAGE  (typically  -  10,000-100,000  signatures 
sequenced  per  sample)  at  reasonable  cost  We  compared  our  MPSS 
data  on  LNCaP  cells  against  publicly  available  SAGE  data  on 
LNCaP  cells  (National  Center  for  Biotechnology  Information  SAGE 
database)  through  common  Unigene  IDs.  The  SAGE  library 
GSM724  (total  SAGE  tags  sequenced:  22,721;  ref  24)  is  derived 
from  LNCaP  cells  with  an  inactivated  PTEN  gene;  it  is  the  SAGE 
library  most  similar  to  our  LNCaP  cells.  Only  400  (-20%)  of  our 
1,987  significantly  differentially  expressed  genes  (P  <  0.001)  had  any 
SAGE  tag  entry  in  GSM724.  These  data  illustrate  the  importance  of 
deep  sequence  coverage  in  identifying  state  changes  in  transcripts 
expressed  at  low-abundance  levels. 

Functional  classifications  of  genes  differentially  expressed 
between  LNCaP  and  CL1  cells.  Examination  of  the  Gene  Ontology 
classification  of  our  1,987  genes  revealed  that  multiple  cellular 
processes  have  changed  during  the  transition  from  LNCaP  to  CL1 
cells.  The  completed  list,  including  Gene  Ontology  annotations,  is 
shown  in  Supplementary  Table  S2.  The  most  interesting  groups, 
categorized  by  function,  are  shown  in  Table  2. 

Nineteen  differentially  expressed  proteins  are  related  to 
apoptosis.  Twelve  of  these  are  up-regulated  in  CL1  cells,  including 
the  apoptosis  inhibitors  human  T-cell  leukemia  virus  type  I  binding 
protein  1  and  CASP8  and  FADD-like  apoptosis  regulator.  Seven  are 
down-regulated  in  CL1,  including  programmed  cell  death  8  and  5 
(apoptosis-inducing  factors)  and  BCL2-like  13  (an  apoptosis 
facilitator).  Because  CL1  cells  have  increased  expression  of 
apoptosis  inhibitors  and  decreased  expression  of  apoptosis 
inducers,  net  inhibition  of  apoptosis  may  contribute  to  their 


Table  1.  Comparison  of  MPSS  and  real-time  RT-PCR  results 


MPSS  signature  Name 


GATCTCAGTTGTAAATA  TSPAN-3 

GATCTCTTTTCAGAAGT  ITM3 

GATCCCTCCAATAAATA  PPP6C 

GATCACAATAAACGATA  PXMP2 

GATCAGATTCACGGACC  PTPRM 

GATCAACCTGTGGCTGT  GPT2 

GATCACAAAATGTTGCC  UGT2B15 

GATCACAGAAATGCATA  PKIB 

GATCCGGGATGGGAGAC  AQP3 


Genbank 
accession  no. 


BC000704 

NM_030926 

AA574270 

NM_018663 

NM_002845 

BC031364 

NM_001077 

NM_03247 1 

BM968943 


LNCR/LNCX' 
ratio  by 
real-time  PCR 

0.35 
0.28 
0.32 
0.32 
11.23 
2.02 
0.08 
0.43 
1.50 


LNCR/LNCX 
ratio  by  MPSS 


0.28 

0.16 

0.00 

0.14 

8.87 

4.03 

0.05 

0.11 

3.56 


tpm  (LNCR) 


147 

22 

0 

5 

683 

250 

36 

35 

96 


tpm  (LNCX) 


521 

140 

27 

35 

77 

62 

702 

329 

27 


*LNCR.  LNCaP  ceils  stimulated  with  androgen;  LNCX.  LNCaP  cells  starved  of  androgen. 
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Table  2.  Examples  of  differentially 

Signatures  LNCaP 

(tpm) 

expressed  genes  and  their  functional  classifications 

CL1  Description 

(tpm) 

Genbank  accession  no. 

Apoptosis  related 

GATCAAATGTGTGGCCT 

0 

3,609 

Lectin,  galactoside  binding,  soluble,  1  (galectin  1) 

BCO01693 

GAT  C ATAATGTTAACTA 

0 

14 

P'ieiomorphic  adenoma  gene-like  1  (PLAGL1) 

NM_002656 

GATCATCCAGAGGAGCT 

0 

16 

Caspase-7.  apoptosis- related  cysteine  protease 

U40281 

GATCGCGGTATTAAATC 

0 

15 

Tumor  necrosis  factor  receptor  superfamily,  member  12 

U75380 

GATCTCCTGTCCATCAG 

0 

24 

IL-l,  (1 

M 15330 

GATCCCCTTCAAGGACA 

1 

19 

Nudix  (nucleoside  diphosphate  linked  moiety  X)  type  motif  1 

NM_006024 

GATCATTGCCATCACCA 

51 

278 

EST,  highly  similar  to  CUL2_buman  Cullin  homologue  2 

AL832733 

GATCTGAAAATTCTTGG 

16 

56 

CASP8  and  FADD-like  apoptosis  regulator 

U97075 

GATCCACCTTGGCCTCC 

49 

149 

Tumor  necrosis  factor  receptor  superfamily,  member  10b 

NM_003842 

GATCATGAATGACTGAC 

118 

257 

Cytochrome  c 

BC009582 

GATCAAGTCCTTTGTGA 

299 

102 

Programmed  cell  death  8  (apoptosis-inducing  factor) 

H20713 

GATCACCAAAACCTGAT 

72 

24 

BCL2-like  13  (apoptosis  facilitator) 

BM904887 

GATCAATCTGAACTATC 

563 

146 

Apoptosis-related  protein  APR-3  (APR-3) 

NM_0160S5 

GATCCCTCTGTACAGGC 

83 

13 

Unc- 13-like  (Caenorhabditis  elegans;  UNC13),  mRNA 

NM_0O6377 

GATCTGGTTGAAAATTG 

1.006 

49 

CED-6  protein  (CED-6),  mRNA 

NM_016315 

GATCTCCCATGTTGGCT 

86 

4 

CASP2  and  R1PK1  domain  containing  adaptor  with  death  domain 

BC017042 

GATCAGAAAATCCCTCT 

27 

1 

DEAD/ii  (Asp-Glu-Aia-Asp/His)  box  polypeptide  20, 103  kDa 

BC0U556 

GATCAAGGATGAAAGCT 

50 

3 

Programmed  cell  death  2 

D20426 

GATCTGATTATTTACTT 

1,227 

321 

Programmed  cell  death  5 

NM_004708 

GATCAAGTCCTTTGTGA 

299 

102 

Programmed  ceil  death  8  (apoptosis-inducing  factor) 

NM_0O42O8 

Cyclins 

GATCCTGTCAAAATAGT 

2 

47 

MCT-1  protein  (MCT-1),  mRNA 

NM  014060 

GATCATTATATCATTGG 

3 

39 

Cyclin-dcpendcnt  kinase  inhibitor  2B  (CDKN2B) 

NMJ578487 

GATCATCAGTCACCGAA 

38 

396 

Cyclin-dependent  kinase  inhibitor  2A  (pi6) 

BM054921 

GATCGGGGGCGTAGCAT 

5 

43 

Cyclin  D1 

NM_053056 

GATCTACTCTGTATGGG 

40 

144 

Cyclin  fold  protein  1 

3G1 19256 

GATCAGCACTCTACCAC 

530 

258 

Cyclin  B1 

BM973693 

GAT  CTGGTGTAGTATAT 

210 

77 

Cyclin  G2 

BM984551 

GATCAGTACACAATGAA 

642 

224 

Cyclin  Gl, 

BC0001% 

GATCTCAGTTCTGCGTT 

918 

308 

CDK2-associated  protein  1  (CDK2AP1),  mRNA 

NM_0O4642 

GaTCCtGaGCTCCCTTT 

2,490 

650 

Cyclin  1 

BCOC0420 

GATCATGCAGTGACATA 

15 

1 

K1AA102S  protein 

AL 122055 

GATCTGTATGTGATTGG 

28 

1 

Cyclin  M3 

AA489077 

Kailikreins 

GATCCACACTGAGAGAG 

841 

0 

KLK3 

AA523902 

GATCCAGAAATAAAGTC 

385 

0 

KLK4 

AA595489 

GATCCTCCTATGTTGTT 

314 

0 

KLK2 

S39329 

CD  markers 

GATCAGAGAAGATGATA 

0 

810 

CD213a2.  IL-13  receptor,  a2 

U709S1 

gatccctaggtcttggg 

23 

161 

CD213al,  IL-13  receptor,  al 

AW874023 

GATCCACATCCTCTACA 

0 

63 

CD33,  CD33  antigen  (gp67) 

BC028152 

GATCAATAATAATGAGG 

0 

151 

CD44,  CD44  antigen 

ALS32642 

GATCCrrCAGCCTTCAG 

0 

35 

CD73,  5'-nucleotidasc,  ecto  (CD73) 

AI831695 

GATCTGGAACCTCAGCC 

1 

50 

CD49c,  integrin,  o5 

BC008786 

GATCAGAGATGCACCAC 

8 

122 

CD138,  syndecan  1 

BM974052 

GATCAAAGGTTTAAAGT 

38 

189 

CD166,  activated  leukocyte  cell  adhesion  molecule 

AL833702 

GATCAGCTGTTTGTCAT 

53 

295 

CD71,  transferrin  receptor  (p90,  CD71) 

BC001 188 

GATCGGTGCGTTCTCCT 

287 

509 

CD  107a,  lysosomal-associated  membrane  protein  1 

A152I424 

GATCTACAAAGGCCATG 

161 

681 

CD29,  integrin,  81 

NM  _00221 1 

GATCATTTATTTTAAGC 

56 

0 

CD10  (neutral  endopeptidase,  enkephalinase) 

BQ0 13520 

GATCAGTCTTTATTAAT 

150 

50 

CD107b,  lysosomal-associated  membrane  protein  2 

AI4S9107 

GATCTTGGCTGTATTTA 

84 

1,014 

CD59  antigen  pi 8-20 

NMJXJ0611 

GATCrrGTGCTGTGCTA 

408 

234 

CD9  antigen  (p24) 

NM_001769 

Transcription  factors 

GATCAAATAACAAGTCT 

0 

62 

Transcription  factor  BMAL2 

BM854818 

GATCTCTATGTTTACTT 

0 

27 

Transcription  factor  BMAL2 

BG1 63364 

GATCCTGACACATAAGA 

12 

74 

Transcription  factor  BMAL2 

BF055294 

(Continued  on  the  following  page) 
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Table  2.  Examples  of  differentially  expressed  genes  and  their  functional  classifications  (Cont'd) 

Signatures  LNCaP  CL1  Description 

(tpm)  (tpm) 

Genbank  accession  no. 

GATCATTTTGTATTAAT 

10 

61 

Transcription  factor  NRF 

BC047878 

GATCGTCTCATATTTGC 

52 

0 

Transcriptional  coactivator  tubedown-100 

NM  025085 

GATCCCCCTCTTCAATG 

0 

31 

Transcriptional  coactivator  with  PDZ-binding  motif 

4)299431 

GATCAAATGCTATTGCA 

1 

55 

Transcriptional  regulator  interacting  with 
the  PHS-bromodomain  2 

Ail 26500 

GATCTGTGACAGCAGCA 

140 

35 

Transducer  of  ERBB2,  1 

BC031406 

GAT  CAAATCTGTACAGT 

239 

23 

Transducer  of  ERBB2,  2 

AA694240 

Annexins  and  their  ligands 

GATCCTGTGCAACAAGA 

0 

69 

Annexin  A10 

BC007320 

GATCTGTGGTGGCAATG 

41 

630 

Annexin  All 

AL5767S2 

GATCAGAATCATGGTCT 

0 

1.079 

Annexin  A2 

BC001388 

GATCTCTTTGACTGCTG 

210 

S60 

Annexin  A5 

BC 001429 

GATCCAAAAACATCCTG 

83 

241 

Annexin  A6 

A! 566871 

GATCAGAAGACTTTAAT 

0 

695 

Annexin  Al 

BC001275 

GATCAGGACACTTAGCA 

0 

2.949 

S100  calcium-binding  protein  A10  (Annexin  11  ligand) 

BC015973 

Matrix  metalloproteinase 

GATCATCACAGTTTGAG 

0 

38 

MMP  10  (stromelysin  2) 

BC002591 

GATCCCAGAGAGCAGCT 

0 

108 

MMP  1  (interstitial  collagenase) 

BC0131 18 

GATCGGCCATCAAGGGA 

0 

25 

MMP  13  (coiiagenase  3) 

AI370581 

GATCTGGACCAGAGACA 

0 

10 

MMP  2  (gelatinase  A) 

BG332150 

greater  tumorigenicity.  Matrix  metalioproteinases  (MMP),  which 
degrade  extracellular  matrix  components  that  physically  impede 
cell  migration,  are  implicated  in  tumor  cell  growth,  invasion,  and 
metastasis.  Wc  found  that  MMPs  1,  2,  10,  and  13  are  significantly 
overexpressed  in  CL1  cells  (Table  2),  which  may  partially  explain 
these  cells’  aggressive  and  metastatic  behavior. 

CD  markers  are  generally  localized  at  the  cell  surface;  some  may 
be  associated  with  prostate  cancer  (25).  We  converted  all  currently 
identified  CD  markers  (CD1-CD247)  from  the  PROW  CD  index 
database  (http://www.ncbi.nlm.nih.gov/prow/guide/45277084.htm) 
to  Unigene  numbers  and  used  these  numbers  to  identify  their 
signatures  and  their  expression  levels.  We  identified  15  CD 
markers  that  are  differentially  expressed  between  LNCaP  and 
CL)  cells  (Z- score  <  0.001;  Table  2).  Eleven  CD  markers,  including 
CD213a2  and  CD213al,  which  encode  interleukin  (IL)-13  receptors 
ori  and  a2,  are  up- regulated  in  CL1  cells;  three  CD  markers,  CD9, 
CD10,  and  CD  107,  are  down- regulated  in  these  cells  (Table  2).  Six 
CD  markers  went  from  0  or  1  to  >35  tpm  (Table  2),  making  them 
good  digital  or  absolute  markers  or  therapeutic  targets.  These  data 
suggest  that  carefully  selected  CD  markers  may  be  useful  in 
following  the  progression  of  prostate  cancer  and  indeed  could 
serve  as  potential  targets  for  antibody-mediated  therapies  (25). 
Additional  functional  categories  can  be  seen  in  Supplementary 
Table  S2. 

Delineation  of  disease-perturbed  networks  in  prostate 
cancer  cells.  Genes  and  proteins  rarely  act  alone  but  rather 
generally  operate  in  networks  of  interactions.  Identifying  key  nodes 
(proteins)  in  the  disease-perturbed  networks  may  provide  insights 
into  effective  drug  targets.  Comparing  the  genes  (proteins) 
currently  available  in  the  314  BioCarta  and  155  KEGG  pathway 
or  network  (http://cgap.ncinih.gov/Pathways/)  databases  with  the 
MPSS  data  through  Unigene  IDs,  we  identified  37  BioCarta  and  14 
KEGG  pathways  that  are  up-regulated  and  23  BioCarta  and  22 
KEGG  pathways  that  are  down-regulated  in  LNCaP  cells  versus  CL1 
cells  (Table  3).  The  number  of  genes  whose  expression  patterns 


changed  in  each  pathway  is  listed  in  Table  3.  Each  gene  along  with 
its  expression  level  in  LNCaP  and  CL1  cells  is  listed  pathway  by 
pathway  in  our  database  (ftp://ftp.systemsbiology.net/pub/blin/ 
mpss).  Changes  in  these  pathways  reveal  the  underlying  phenotypic 
differences  between  LNCaP  and  CL1  cells.  For  example,  multiple 
networks  involved  in  modulating  cell  mobility,  adhesion,  and 
spreading  are  up-regulated  in  CL1  cells,  which  are  more  metastatic 
and  invasive  than  I.NCaP  cells  (Table  3).  In  the  uCalpain  and 
friends  in  cell  spread  pathway,  calpains  are  calcium-dependent 
thiol  proteases  implicated  in  cytoskeletai  rearrangements  and  cell 
migration.  During  cell  migration,  calpain  cleaves  target  proteins, 
such  as  talin,  ezrin,  and  paxiliin,  at  the  leading  edge  of  the 
membrane  while  at  the  same  time  cleaving  the  cytoplasmic  tails  of 
the  integrins  pt(a)  and  (J3(b)  to  release  adhesion  attachments  at  the 
trailing  membrane  edge.  Increased  activity  of  calpains  increases 
migration  rates  and  facilitates  cell  invasiveness  (26). 

Many  pathways  we  identified  as  perturbed  in  the  LNCaP  and 
CL1  comparison  are  interconnected  to  form  networks  (in  fact, 
there  are  probably  no  discrete  pathways,  only  networks).  For 
example,  the  insulin  signaling  pathway,  the  signal  transduction 
through  [L-l  receptor  pathway,  and  nuclear  factor-«B  (NF-kB) 
signaling  pathway  are  interconnected  through  c-Jun,  IL-1  receptor, 
and  NF-kB.  The  mapping  of  genes  onto  networks/pathways  will  be 
an  ongoing  objective  as  more  networks/pathways  become 
available.  Our  transcriptome  data  will  be  an  invaluable  resource 
in  delineating  these  relationships. 

As  gene  regulatory  networks  controlled  by  transcription  factors 
form  the  top  layer  of  the  hierarchy  that  controls  the  physiologic 
network,  we  sought  to  identify  differentially  expressed  transcrip¬ 
tion  factors.  Of  554  transcription  factors  expressed  in  LNCaP  and 
CL1  cells,  112  showed  significantly  different  levels  between  the  cell 
lines  (P  <  0.001;  Supplementary  Table  S3).  This  clearly  showed 
significant  difference  in  the  functioning  of  the  corresponding  gene 
regulatory  networks  during  the  progression  of  prostate  cancer  from 
the  early  to  late  stages. 
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Table  3.  Pathways  that  are  up-regulated  or  down-regulated  comparing  LNCaP  cells  to  CL1  cells 

Pathways  No.  gene  hits  No.  P  <  0.001  No.  P  <  0.001 

in  a  pathway  and  LNCA  >  CL1  and  LNCA  <  CL1 

No.  no  change 

Up-regulated  pathways  in  LNCaP  cells 

BioCarta  pathways 

Mechanism  of  gene  regulation  by  peroxisome 

35 

9 

2 

24 

proliferators  via  PPARct 

T-ceil  receptor  signaling  pathway 

21 

6 

2 

13 

ATM  signaling  pathway 

15 

5 

2 

8 

CARMt  and  regulation  of  the  estrogen  receptor 

18 

5 

2 

11 

HIV  type  !  Nef  negative  effector  of  Fas  and 

33 

5 

2 

26 

tumor  necrosis  factor 

Epidermal  growth  factor  signaling  pathway 

17 

5 

I 

11 

Hole  of  BRCA1,  BRCA2,  and  ATR  in  cancer  susceptibility 

16 

5 

1 

10 

Tumor  necrosis  factor  receptor  1  signaling  pathway 

17 

5 

1 

11 

Toll-like  receptor  pathway 

17 

5 

1 

11 

FAS  signaling  pathway  CD95 

17 

4 

1 

12 

Vascular  endothelial  growth  factor 

16 

4 

1 

11 

hypoxia  and  angiogenesis 

Bone  remodeling 

9 

3 

1 

5 

Estrogen  receptor-associated  degradation  ERAD  pathway 

11 

3 

1 

7 

Estrogen-responsive  protein  Efp  controls  ceil  cycle 

11 

3 

1 

7 

and  breast  tumors  growth 

Influence  of  Ras  and  Rho  proteins  on  GrS  transition 

16 

3 

1 

12 

Inhibition  of  cellular  proliferation  by  Gleevec 

13 

3 

1 

9 

Mitogen-activated  protein  kinase  inactivation 

9 

3 

1 

5 

of  SMRT  corepressor 

NF-kB  activation  by  nontypeabie  Haemophilus  influenzae 

16 

3 

1 

12 

Rb  tumor  suppressor  checkpoint  signaling  in 

10 

3 

I 

6 

response  to  DNA  damage 

Transcription  regulation  by  methyltransferase  of  CARMt 

10 

3 

1 

6 

Ceramirie  signaling  pathway 

13 

4 

0 

9 

Cystic  fibrosis  transmembrane  conductance  regulator 

7 

4 

0 

3 

and  ^2-adrenergic  receptor  pathway 

Nerve  growth  factor  pathway 

11 

4 

0 

7 

Platelet-derived  growth  factor  signaling  pathway 

16 

4 

0 

12 

Tumor  necrosis  factor  stress-related  signaling 

14 

4 

0 

10 

Activation  of  COOH-terminal  Srk  kinase  by  cyclic 

9 

3 

0 

6 

AMP-dependent  protein  kinase  inhibits 
signaling  through  the  T-cell  receptor 

AKAP95  role  in  mitosis  and  chromosome  dynamics 

11 

3 

0 

S 

Attenuation  of  GPCR  signaling 

7 

3 

0 

4 

Chaperones  modulate  1FN  signaling  pathway 

11 

3 

0 

8 

ChREBP  regulation  by  carbohydrates  and  cyclic  AMP 

12 

3 

0 

9 

Insulin-like  growth  factor-I  signaling  pathway 

11 

3 

0 

8 

Insuiin  signaling  pathway 

11 

3 

0 

8 

NF-kB  signaling  pathway 

11 

3 

0 

S 

Protein  kinase  A  at  the  centrosome 

12 

3 

0 

9 

Regulation  of  cfcl  cdkS  by  type  1  glutamate  receptors 

10 

3 

0 

7 

Role  of  mitochondria  in  apoptotic  signaling 

10 

3 

0 

7 

Signal  transduction  through  IL-I  receptor 

14 

3 

0 

11 

KEGG  pathways 

Aminosugar  metabolism 

24 

9 

4 

11 

Androgen  and  estrogen  metabolism 

37 

13 

5 

19 

Benzoate  degradation  via  hydroxylation 

5 

3 

1 

1 

C21-Steroid  hormone  metabolism 

4 

1 

0 

3 

Co-Branched  dibasic  acid  metabolism 

2 

2 

0 

0 

Carbazole  degradation 

1 

1 

0 

0 

Terpenoid  biosynthesis 

6 

4 

1 

1 

Chondroitin-heparan  sulfate  biosynthesis 

14 

8 

3 

3 

Fatty  acid  biosynthesis  (path  I) 

3 

2 

0 

I 
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Table  3.  Pathways  that  are  up-regulated  or  down-regulated  comparing 

Pathways  No.  gene  hits 

in  a  pathway 

LNCaP  cells  to  CL1  cells  (Cont’d) 

No.  P  <  0.001  No.  P  <  0.001 

and  LNCA  >  CL1  and  LNCA  <  CL1 

No.  no  change 

Fluorene  degradation 

3 

2 

0 

1 

Pentose  and  glucuronate  interconversions 

19 

9 

1 

9 

Phenylalanine,  tyrosine,  and  tryptophan  biosynthesis 

10 

5 

2 

3 

Porphyrin  and  chlorophyll  metabolism 

2S 

13 

3 

12 

1 

Streptomycin  biosynthesis 

6 

•1 

1 

Up-regulated  pathways  in  Chi  cells 

BioCarta  pathways 

Rho  cel!  motiiity  signaling  pathway 

18 

2 

6 

10 

Trefoil  factors  initiate  mucosal  healing 

14 

1 

6 

7 

Integrin  signaling  pathway 

14 

1 

5 

8 

Ca2'/calmodulin-dependent  protein  kinase  activation 

7 

1 

4 

2 

Effects  of  caicineurin  in  keratinocyte  differentiation 

9 

1 

4 

4 

Angiotensin  11-mediated  activation  of  c-Jun 

12 

1 

3 

8 

NH-j-terminal  kinase  pathway  via  Pyk2-dependent  signaling 

Bioactive  peptide-induced  signaling  pathway 

16 

1 

3 

12 

CBL-mediated  ligand-induced  down-regulation  of 

6 

1 

3 

2 

epidermal  growth  (actor  receptors 

Control  of  skeletal  rr.yogenesis  by  HDAC  calcium 

12 

1 

3 

8 

calmodulin-dependent  kinase  CaMK 

How  does  Salmonella  hijack  a  ceil 

8 

1 

3 

4 

Melanocyte  development  and  pigmentation  pathway 

4 

1 

3 

0 

Overview  of  telomerase  protein  component  gene  hTert 

7 

I 

3 

3 

transcriptional  regulation 

Reguiation  of  PGC-  fa 

9 

0 

4 

5 

ADP-ribosylation  factor 

9 

0 

3 

6 

Down-regulated  of  MTA-3  in  estrogen 

7 

0 

3 

4 

receptor-negative  breast  tumors 

Endocytotic  role  of  NDK  phosphins  and  dynamin 

7 

0 

3 

4 

Mechanism  of  protein  import  into  the  nucleus 

7 

0 

3 

4 

Nuclear  receptors  in  lipid  metabolism  and  toxicity 

7 

0 

3 

4 

Pertussis  toxin-insensitive  OCRS 

9 

0 

3 

ft 

signaling  in  macrophage 

Platelet  amyloid  precursor  protein  pathway 

5 

0 

3 

2 

Role  of  Ran  in  mitotic  spindle  regulation 

8 

0 

3 

5 

Sumoyiation  by  RanBP2  regulates 

S 

0 

3 

5 

transcriptional  repression 

uCalpain  and  friends  in  cell  spread 

5 

0 

3 

2 

KEGG  pathways 

Arginine  and  proline  metabolism 

45 

7 

16 

22 

ATP  synthesis 

31 

7 

15 

9 

Biotin  metabolism 

5 

1 

3 

1 

Blood  group  giycolipid  biosynthesis,  lactoseries 

12 

1 

6 

5 

Cyanoamino  acid  metabolism 

5 

0 

3 

2 

Ethylbenzene  degradation 

9 

1 

3 

5 

Ganglioside  biosynthesis 

16 

2 

6 

s 

Globoside  metabolism 

17 

3 

8 

6 

Glutathione  metabolism 

26 

4 

10 

12 

Glycine,  serine,  and  threonine  metabolism 

32 

6 

14 

12 

Giycosphingolipid  metabolism 

33 

6 

18 

ii 

Glycosylphosphatidyli.iositol-anchor  biosynthesis 

26 

5 

12 

9 

Glyoxylate  and  dicarboxyiate  metabolism 

9 

i 

6 

2 

Huntington's  disease 

25 

4 

10 

11 

Methane  metabolism 

9 

1 

3 

5 

O-Glycans  biosynthesis 

19 

3 

8 

s 

One-carbon  pool  by  folate 

12 

2 

8 

2 

Oxidative  phosphorylation 

93 

21 

45 

27 

Parkinson’s  disease 

30 

5 

14 

11 

Phospholipid  degradation 

21 

4 

12 

5 

Synthesis  and  degradation  of  ketone  bodies 

7 

1 

3 

3 

Urea  cycle  and  metabolism  of  amino  groups 

18 

2 

8 

8 
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As  secreted  proteins  can  readily  be  exploited  for  blood  cancer 
diagnosis  and  prognosis,  we  next  asked  how  many  of  our 
differentially  expressed  genes  encode  secreted  proteins.  We 
identified  521  signatures  belonging  to  >160  genes  potentially 
encoding  secreted  proteins  (Supplementary  Table  S6).  Among 
these,  287  (259  genes)  and  234  (201  genes)  signatures,  respectively, 
are  overexpressed  or  underexpressed  in  CL1  cells  compared  with 
LNCaP  cells.  Thus,  one  can  think  about  using  blood  diagnostics 
(changes  in  relevant  protein  concentrations)  to  follow  prostate 
cancer  progression. 

Quantitative  proteomic  analysis  of  prostate  cancer  cells.  We 
quantitatively  profiled  the  protein  expression  changes  between 
LNCaP  and  CL1  cells  using  the  ICAT-MS/MS  protocol  described 
by  Han  et  al.  (15).  We  generated  a  total  of  142,849  MS/MS,  7,282 
of  which  corresponded  to  peptides  with  a  mass  spectrum 
quality  score  P  of  >0.9  (allowing  unambiguous  identification  of 
peptides;  ref.  16).  We  obtained  quantitative  peptide  ratios  for 
4,583  peptides  corresponding  to  940  proteins.  The  number  of 
peptides  is  greater  than  the  number  of  proteins  because  (a) 
mass  spectrometry  identified  multiple  peptides  from  the  same 
protein  and  (ft)  the  ionization  step  of  mass  spectrometry 
created  different  charge  states  for  the  same  peptide.  The  protein 
ratios  were  calculated  from  multiple  peptide  ratios  using  an 
algorithm  for  the  Automated  Statistical  Analysis  on  Protein 
Ratio  (17).  In  the  end,  we  identified  82  proteins  that  are  down- 
regulated  and  108  proteins  that  are  up-regulated  by  at  least  1.8- 
fold  in  LNCaP  cells  compared  with  CL1  cells.  The  functional 
classification  of  the  proteins  identified  is  shown  in  Supplemen¬ 
tary  Table  S4. 

Fifty-four  percent  (103  of  190)  of  differentially  expressed 
proteins  identified  have  enzymatic  activity.  Many  of  the  proteins 
identified  are  involved  in  fatty  acid  and  lipid  metabolism, 
including  fatty  acid  synthase,  carnitine  palmitoyltransferase  II, 
and  propionyl  CoA  carboxylase  a  polypeptide.  Fatty  acid  and 
lipid  metabolism  is  perturbed  in  prostate  cancer  (27,  28). 
Additionally,  many  genes  involved  in  lipid  transport  were 
altered,  including  five  Annexin  family  proteins,  prosaposin,  and 
fatty  acid  binding  protein  5  (Supplementary  Table  S4).  Annexin 
Al  was  shown  to  be  overexpressed  in  non-PSA-producing 
LNCaP  cells  compared  with  PSA-producing  LNCaP  cells  (4). 
Annexin  A7  is  postulated  to  be  a  prostate  tumor  suppressor 
gene  (29).  Annexin  A2  expression  is  reduced  or  lost  in  prostate 
cancer  cells,  and  its  re-expression  inhibits  prostate  cancer  ceil 
migration  (30). 

Other  genes  we  identified  here  have  been  implicated  in 
carcinogenesis,  including  tumor  suppressor  p!6  and  insulin-like 
growth  factor-II  receptor  (27,  31).  Some  genes  have  been 
implicated  previously  in  prostate  cancer,  such  as  prostate  cancer 
overexpressed  gene  1  (POVI),  which  is  overexpressed  in  prostate 
cancer  (32),  and  51  and  al  catenin  (cadherin-associated  protein) 
and  junction  plakoglobin,  which  are  down-regulated  in  prostate 
cancer  cells  (33).  However,  the  potential  relationships  of  most  of 
the  proteins  identified  here  to  prostate  cancer  require  further 
elucidation.  For  example,  transmembrane  protein  4  (TMEM4),  a 
gene  predicted  to  encode  a  182-amino  acid  type  II  transmem- 
brane  protein,  is  down-regulated  -  2-fold  in  CL1  cells  compared 
with  LNCaP  cells.  MPSS  data  also  indicated  that  TMEM4  is 
down-regulated  -  2-fold  in  CLi  cells.  Many  type  II  transmem- 
brane  proteins,  such  as  TMPRSS2,  are  overexpressed  in  prostate 
cancer  patients  (34).  It  will  be  interesting  to  see  whether 
TMEM4  overexpression  plays  a  primary  role  in  prostate 


carcinogenesis.  We  also  identified  12  proteins  that  have  not 
been  annotated  or  functionally  characterized.  The  relationships 
between  these  novel  proteins  and  prostate  cancer  also  need 
further  study. 

Additionally,  we  sought  to  compare  the  changes  in  expression  at 
the  protein  level  in  the  two  ceil  states  with  changes  at  the  mRNA 
level.  We  converted  the  protein  IDs  and  MPSS  signatures  to 
Unigene  IDs  to  compare  the  MPSS  data  with  the  ICAT-MS/MS 
data.  We  limited  this  comparison  to  those  with  common  Unigene 
IDs  and  with  reliable  ICAT  ratios  (SD  <0.5)  and  ended  up  with  a 
subset  of  79  proteins.  Of  these,  66  genes  (83.5%)  were  concordant  in 
their  changes  in  mRNA  and  protein  levels  of  expression  and  13 
genes  (16.5%)  were  discordant  (i.e.,  having  higher  protein 
expression  but  lower  mRNA  expression  or  vice  versa).  The  scatter 
plot  of  protein/mRNA  expression  ratios  is  shown  in  Fig.  1.  There 
are  no  functional  similarities  among  the  discordant  genes.  As  these 
mRNAs  and  proteins  are  expressed  at  relatively  high  levels, 
discordance  due  to  measurement  errors  is  unlikely.  Clearly,  post- 
transcriptional  mechanism(s)  of  protein  expression  is  important, 
although  the  elucidation  of  the  specific  mechanism(s)  awaits 
farther  studies. 

Discussion 

The  systems  approach  to  disease  is  predicated  on  the  idea 
that  the  disease  process  is  reflected  in  disease-perturbed  protein 
and  gene  regulatory  networks.  Molecular  systems  biology  has 
two  important  features:  (a)  it  employs  global  analyses  where 
global  implies  studying  changes  in  transcript  or  protein  levels  as 
well  as  the  relationships  of  all  of  the  elements  in  the  system  and 
(fi)  it  integrates  different  types  of  biological  information  (single 
nucleotide  polymorphisms,  DNA,  mRNA,  protein,  protein  inter¬ 
actions,  etc.).  MPSS  is  a  powerful  and  sensitive  technology  that 
allows  deep  analysis  of  the  prostate  transcriptome.  The  MPSS 
protocol  we  used  for  this  study  relies  on  GATC  enzymatic  sites 
to  cleave  the  3'  region  of  cDNAs  to  generate  DNA  fragments  as 
substrates  for  MPSS.  cDNAs  lacking  GATC  in  their  3'  region 
would  be  excluded  from  these  analyses.  The  estimated 
percentage  of  cDNA  clones  lacking  an  appropriately  positioned 
GATC  site  is  -3%  as  calculated  from  the  Mammalian  Genome 
Collection  full-length  sequences.  Among  the  15,064  Mammalian 
Genome  Collection  sequences,  14,602  (96.93%)  sequences  have 
appropriate  GATC  sites.  The  protocol  we  used  is  also  biased 
toward  capturing  MPSS  signatures  within  500  bp  5’  of  the  poly(A) 
site.  If  the  GATC  site  is  located  beyond  500  bp  5' of  the  poly(A) 
site,  it  will  likely  be  missed  as  well.  For  example,  NKX3.1.  a 
prostate-specific  and  androgen-regulated  gene  (35,  36),  is  not 
found  in  our  MPSS  data  set  because  its  GATC  site  closest  to  the 
poly(A)  tail  (Genbank  accession  no.  AF247704)  is  2.8  kb  away. 
Recently,  a  new  protocol  that  eliminates  this  bias  was  developed 
at  Lynx.'1  We  estimated  that  LNCaP  cells  expressed  -2SO.OOO 
transcripts  per  cell.  We  obtained  -  900  ug  of  total  RNA  from  10s 
cells.  With  an  average  of  3%  polyadenyiated  RNA  and  an  average 
transcript  length  of  I  kb.  this  corresponds  to  280,000  transcripts 
per  cell.  Therefore,  with  >2  million  signatures  obtained  for  each 
cell  state  by  MPSS,  we  can  detect  transcripts  expressed  at  levels 
of  <1  transcript  per  ceil  (this  means  that  not  all  cells  express  the 
transcript). 


4  D.  2ho u.  personal  communication. 
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Figure  1.  Scatter  plot  of  the  protein  ratios 
obtained  by  ICAT  and  the  mRNA  expression  ratios 
obtained  by  MPSS.  Expression  ratios  in 
Supplementary  Table  S3  were  transformed  to 
natural  logarithms  and  then  plotted. 
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The  BioCarta  and  KEGG  databases  describe  469  protein 
pathways  or  networks  (http://cgap.nci.nih.gov/Pathways/).  We 
have  identified  37  BioCarta  and  14  KEGG  pathways  that  are  up- 
regulated  and  23  BioCarta  and  22  KEGG  pathways  that  are  down- 
regulated  in  LNCaP  cells  versus  CL1  ceils.  We  have  also  shown  that 
112  transcription  factors  change  between  these  two  disease  states, 
consistent  with  the  fact  that  several  different  gene  regulatory 
networks  are  perturbed.  These  changes  indicate  significant 
alterations  of  the  corresponding  gene  regulatory  networks.  These 
transcription  factors  include  androgen  receptor  along  with  other 
six  transcription  factors,  such  as  the  ets  homologous  factor,  a  liver- 
specific  bHLH-Zip  transcription  factor,  an  1FN  regulatory  factor, 
and  CCCTC-binding  factor  (zinc  finger  protein;  by  exploring  data 
in  Supplementary  Table  S3).  The  fascinating  question  is  which  of 
these  networks  are  directly  correlated  with  prostate  cancer 
progression  and  which  are  changed  secondarily  as  a  consequence 
of  their  connections  to  the  primary  disease  networks.  We  are 
working  on  strategies  to  distinguish  these  possibilities.  Neverthe¬ 
less,  we  can  firmly  conclude  that  the  progression  from  early-stage 
to  late-stage  prostate  cancer  as  represented  by  LNCaP  and  CL1 
cells  clearly  is  reflected  in  significant  changes  in  both  protein  and 
gene  regulatory  networks. 

In  contrast  to  the  MPSS  technolog)',  the  ICAT  technology  is  an 
immature  technology  that  cannot  now  carry  out  global  analyses 
(37).  The  integration  of  different  types  of  data  provides  powerful 
new  approaches  to  defining  more  precisely  protein  and  gene 
regulatory  networks  (38).  We  have  shown  that  the  protein  and 
RNA  expression  levels  of  66  of  79  genes  (83.5%)  were  concordant 
(i.e.,  changes  in  the  same  direction;  Supplementary  Table  S5). 
This  concordance  rate  is  higher  than  that  reported  elsewhere 


(39,  40).  Waghray  et  al.  found  that  only  8  of  25  (32%)  androgen- 
responsive  genes  in  LNCaP  cells  showed  concordance  between 
protein  levels  measured  by  two-dimensional  gels  and  MS/MS 
and  mRNA  levels  analyzed  by  SAGE  (39).  Although  genes  in 
different  experimental  systems  may  have  different  concordance 
rates  between  mRNA  and  protein  expression,  use  of  different 
methods  for  quantitative  protein  profiling  (ICAT-MS/MS  versus 
two-dimensional  gel-MS/MS)  and  mRNA  expression  profiling 
(MPSS  versus  SAGE)  may  also  account  for  the  differences.  It  is 
also  critical  to  use  only  those  data  with  high  confidence  levels  in 
the  comparisons  between  mRNA  and  protein  levels.  The 
expression  levels  obtained  by  MPSS  are  more  accurate  than 
those  obtained  by  SAGE  or  DNA  microarravs  because  of  the 
deep  sequence  coverage  MPSS  achieves.  We  have  also  limited 
our  data  set  to  only  those  proteins  (649  of  them)  that  were 
identified  in  multiple  peptide  hits  and  in  which  the  ICAT  ratios 
did  not  vary  greatly  among  different  peptides  from  the  same 
protein  (SD  <  0.5).  Such  variation  could  derive  from  experimen¬ 
tal  errors  or  from  different  protein  isoforms.  There  are  a 
multiplicity  of  post-transcriptional  mechanisms  that  have  been 
described  and  there  are  probably  more  to  be  identified  (41).  The 
important  point  is  that  this  major  aspect  of  control  could  not 
have  been  identified  without  the  integration  of  two  data  types— 
mRNAs  and  proteins. 

The  systems  approach  provides  powerful  new  approach  to 
diagnostics.  The  idea  is  that  disease-perturbed  networks  change 
their  patterns  of  mRNA  and  protein  expression  both  within  the 
diseased  cells  and  in  terms  of  the  proteins  they  synthesize  that 
are  secreted  into  the  blood.  Of  the  1,987  mRNAs  that  changed  in 
the  transition  from  LNCaP  to  CL1  cells  (eariy-stage  to  late-stage 
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cancer),  460  (23.2%)  encoded  proteins  that  were  potentially 
secreted  (Supplementary  Table  S6).  Sixteen  of  these  putative 
secreted  proteins  were  also  identified  to  be  differentially 
expressed  in  these  two  cell  states  by  the  ICAT  approach 
(Supplementary  Table  S6).  Of  the  190  differentially  expressed 
proteins  identified  by  the  ICAT  approaches,  22  were  predicted  to 
be  secreted  proteins  (Supplementary  Table  S6).  These  proteins  are 
excellent  candidates  for  investigation  as  diagnostic  markers  for 
prostate  cancer  progression.  The  interesting  point  is  that  these 
secreted  diagnostic  markers  will  serve  as  surrogates  for  the  state 
of  the  corresponding  protein  and  gene  regulatory  networks  and 
potentially  will  enable  one  to  (a)  stratify  disease  into  distinct 
categories  (e.g.,  relatively  benign,  slowly  invasive,  and  rapidly 
metastatic  for  prostate  cancer),  for  these  different  types  of 
prostate  cancer  will  employ  different  disease-perturbed  networks); 
( b )  follow  progression;  (c)  follow  response  to  therapy;  and  (d) 
monitor  adverse  drug  reactions.  The  other  interesting  possibility 
is  that  the  perturbed  secreted  proteins  will  serve  as  markers  to 
identify  the  primary  disease-perturbed  networks  and  accordingly 
will  identify  networks  that  may  harbor  excellent  protein 
candidates  for  drug  targeting— drug  targets  that  may  kill  disease 
cells  specifically  or  return  the  networks  to  a  more  normal  state. 

Interestingly,  these  two  states  of  prostate  cancer  progression 
can  lead  to  "digital  changes"  (i.e.,  changes  from  0  to  >50  tpm). 
Thus,  one  can  possibly  obtain  diagnostic  markers  that  are  digital 
in  the  sense  that  they  transition  from  no  expression  to  some 
expression.  In  the  transition  from  LNCaP  cells  to  CL1  cells,  there 
are  175  signatures  (169  mRNAs)  that  go  from  0  to  >50  tpm. 


Likewise,  in  going  from  CL1  cells  to  LKCaP  cells,  there  are  131 
signatures  (128  mRNAs;  Supplementary  Table  S2).  Among  the 
transcription  factors  we  identified,  eight  transcription  factors 
changed  from  0  tpm  in  LNCaP  to  >50  tpm  in  CL1  ceils  and 
seven  transcription  factors  changed  from  >50  tpm  in  LNCaP 
cells  to  0  tpm  in  CL1  cells  (Supplementary  Table  S3).  Eight 
pathways  were  affected  by  the  “digital  changes”  (Supplementary 
Table  S7).  For  example,  acid  ceramidase  1  and  aspartate 
aminotransferase  changed  from  >50  tpm  in  LNCaP  ceils  to  0 
tpm  in  CL1  cells,  affecting  multiple  pathways,  including  the 
insulin-like  growth  factor-!  receptor  pathway  and  activation  of 
COOH-terminal  Srk  kinase  pathway  (Supplementary  Table  S7). 
It  will  be  interesting  to  test  these  potential  digital  diagnostic 
markers. 

Our  analyses  provide  an  excellent  database  and  powerful 
resource  enabling  the  development  of  tools  for  multivariable 
diagnosis  and  prognosis.  They  represent  a  significant  step  toward 
a  system-wide  understanding  of  prostate  cancer  progression.  The 
systems  approach  to  disease  will  offer  powerful  to  approaches  to 
diagnostics,  therapeutics,  and  even  prevention  in  the  future  (42). 
it  will  almost  certain  usher  in  an  era  of  predictive  and  preventive 
medicine  over  the  next  10  to  20  years  (43). 
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